48 research outputs found

    Deciphering the genetic background of quantitative traits using machine learning and bioinformatics frameworks

    In this thesis, I developed two frameworks that help to uncover the genetic mechanisms underlying quantitative traits. My focus was to design efficient methodologies for discovering genotype-phenotype associations and then to use the identified associations to describe the regulatory mechanisms that cause phenotypic differences among individuals. In the first framework, I investigated key regulatory mechanisms governing the development of eggshell strength. The aim was to highlight temporal changes in the signaling cascades that govern eggshell strength over the life of a bird. I considered chicken eggshell strength at two time points during the egg production cycle and studied the genotype-phenotype associations by applying the Random Forest algorithm to genotypic data. For the analysis of the corresponding genes, a well-established systems biology approach was adopted to delineate the gene regulatory pathways and master regulators underlying this important trait. My results indicate that, while some master regulators (e.g., Slc22a1 and Sox11) and pathways are common to different laying stages, others (e.g., Scn11a, St8sia2, or the TGF-beta pathway) were found in only one stage and thus represent age-specific functions. Overall, my results provide (i) significant insights into age-specific and common molecular mechanisms underlying the regulation of eggshell strength; and (ii) new breeding targets to improve eggshell quality during the later stages of the chicken production cycle. In my second framework, I combined Random Forests with a signal detection strategy to identify robust genotype-phenotype associations. The objective of this framework was to improve the efficiency of single-SNP based association analysis. Genome-wide association studies (GWAS) are a well-established methodology for identifying genomic variants and genes responsible for traits of interest in all branches of the life sciences. Although this methodology has had a long time to mature, the reliable detection of genotype-phenotype associations is still a challenge for many quantitative traits, mainly because of the large number of genomic loci with weak individual effects on the trait under investigation. Thus, it can be hypothesized that many genomic variants with a small but real effect remain unnoticed in many GWAS approaches. Here, we propose a two-step procedure to address this problem. In the first step, cubic splines are fitted to the test statistic values, and genomic regions with spline peaks that are higher than expected by chance are considered quantitative trait loci (QTL). Then the SNPs in these QTLs are prioritized with respect to the strength of their association with the phenotype using a Random Forests approach. As a case study, we apply our procedure to real data sets and find trustworthy numbers of partially novel genomic variants and genes involved in various egg quality traits.
    2021-10-1

    Seasonality in Presentation of Acute Appendicitis

    Background: To assess the trends in the incidence of appendicitis and its pattern of variation with age, sex, and season of the year. Methods: In this cross-sectional prospective study, patients who underwent appendectomy for acute appendicitis were included. The demographic features, length of hospital stay, seasonal variation, and post-operative outcome were assessed. The diagnosis of acute appendicitis was established by history, examination, and investigations in terms of leukocyte count, urinalysis, and, in many of these cases, ultrasound examination. In the North Punjab region, the year is divided into two well-marked seasons with short transitional periods between the long, hot, rainless summer (May to October) and the comparatively short, cool winter (December to February). SPSS version 16 was used for all statistical assessments and analysis. Results: Of 972 patients, 53% were male. Ages ranged from 5 to 70 years. All patients were treated surgically, by open or laparoscopic means. Forty patients were found to have a perforated appendix, 12 patients presented with an abdominal mass, and 3 patients presented with an appendicular abscess. A significant seasonal effect was observed, with the rate of acute appendicitis being higher in the summer months. Conclusion: Appendicitis shows a seasonal pattern with a predominant peak during the summer months, which could be due to increased gastrointestinal infections in summer. Males have a higher incidence of acute appendicitis, with 11-20 years being the most common age group

    Subgraph Retrieval for Biomedical Open-Domain Question Answering: Unlocking the Knowledge Graph Embedding Power

    Structured KG is more popular than KG; Language Models do not capture the semantic meaning of the same context with billions of parameters. Retrieving the entire Knowledge Graph (KG), however, is quite challenging because of size and memory issues. Moreover, reasoning over the whole KG to infer the answer to a question takes time and can lead to an incorrect answer. Pre-trained LMs have broad knowledge coverage but still need to perform better on structured reasoning, such as handling negation and flipped conditions. We aim to retrieve the relevant subgraph from the large KG. The existing subgraph retrieval solutions primarily focus on discriminative k-hop approaches or SPARQL queries on massive KGs. However, they require time-consuming, unsustainable operations in real-world contexts like biomedicine, where the entities and the known relationships among them are massive. They frequently rely on named entity linking (NEL) tools that fail to recognize and map entities and cannot generalize to similar or higher-order concepts. Instead, approximate search on dense representations of KGs and text can significantly boost the effectiveness and efficiency of subgraph construction, thanks to enhanced generalization capabilities that overcome NEL limits and to the possibility of indexing embeddings and speeding up top-K retrieval operations. In our work, we analyzed the existing methods of subgraph construction and found that they could be more efficient: the size and quality of the retrieved subgraph affect the reasoning process for extracting an answer to the question. Therefore, we propose Subgraph Retrieval, which tries to find the more relevant entities through linked paths (path queries) to the topic entities. The goal is to find the sequence of relations, and their connected entities, linked to the topic entities by measuring their similarity in a dense embedding space
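
    The top-K retrieval over dense representations mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses exact cosine similarity rather than an approximate index, and the relation names, the two-dimensional embeddings, and the functions `cosine` and `top_k` are all hypothetical.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k(query, candidates, k=2):
    """Names of the k candidate embeddings most similar to the query embedding."""
    ranked = sorted(candidates,
                    key=lambda name: cosine(query, candidates[name]),
                    reverse=True)
    return ranked[:k]

# Hypothetical relation embeddings; a real system would use learned KG embeddings.
relations = {
    "treats": [1.0, 0.0],
    "causes": [0.0, 1.0],
    "interacts_with": [0.7, 0.7],
}
question_embedding = [0.9, 0.1]  # hypothetical embedding of the question
best = top_k(question_embedding, relations, k=2)
```

    With these toy vectors the two relations closest to the question are kept and the remaining one is discarded; at realistic scale the exhaustive sort would typically be replaced by an approximate nearest-neighbour index.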

    Modeling and forecasting exchange rate dynamics in Pakistan using ARCH family of models

    The main objective of this paper is to provide a comprehensive understanding of the theoretical and empirical working of the GARCH class of models and to exploit the potential gains from modeling the conditional variance once it is confirmed that the errors of the conditional mean model exhibit time-varying volatility. Another objective is to find, among the autoregressive moving average (ARMA), autoregressive conditional heteroscedasticity (ARCH), generalized autoregressive conditional heteroscedasticity (GARCH), and exponential generalized autoregressive conditional heteroscedasticity (EGARCH) models, the time series model that gives the best prediction of exchange rates. The data used in the present study consist of monthly exchange rates of Pakistan for the period from July 1981 to May 2010, obtained from the State Bank of Pakistan. GARCH(1,2) is found to be best at removing the persistence in volatility, while EGARCH(1,2) successfully accounts for the leverage effect in the exchange rate returns under study.
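
    To make the idea of a conditional variance model concrete, the following sketch computes the variance recursion of a GARCH(1,1) model, sigma2[t] = omega + alpha * eps[t-1]^2 + beta * sigma2[t-1]. This is the simplest member of the family, not the GARCH(1,2) specification the paper selects, and the function name, residuals, and parameter values are illustrative assumptions rather than estimates from the study.

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variances of a GARCH(1,1) model for given mean-model residuals."""
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0 and alpha + beta < 1")
    # Initialise with the unconditional variance implied by the parameters.
    sigma2 = [omega / (1.0 - alpha - beta)]
    # Each new variance depends on the previous squared residual and variance.
    for eps in residuals[:-1]:
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2

# Hypothetical residuals and parameters, chosen only to show the recursion.
variances = garch11_variance([0.0, 1.0, 0.0], omega=0.1, alpha=0.2, beta=0.7)
```

    A large residual at one step raises the variance at the next step, and the beta term lets that increase decay slowly; this is the volatility persistence the abstract refers to.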

    An application of safety analysis technique for thermal optimization of a paper machine

    This investigation proposes a systematic framework, based on fault tree analysis, for the energy and exergy optimization of a paper machine. In the first step, energy and exergy analysis was performed on the dryer section and its related steam and condensate system to assess the existing performance of the paper machine. In the next step, checklists were developed to identify key safety and operational issues related to the steam and condensate system. Then, a fault tree analysis was applied to identify and resolve the root causes. Corrective actions were taken to rectify various problems. As a result, the energy and exergy efficiencies of the system increased by 22% and 14%, respectively. Moreover, an improvement in the steam and condensate recovery system was observed.

    Combining Random Forests and a Signal Detection Method Leads to the Robust Detection of Genotype-Phenotype Associations

    Genome-wide association studies (GWAS) are a well-established methodology for identifying genomic variants and genes responsible for traits of interest in all branches of the life sciences. Although this methodology has had a long time to mature, the reliable detection of genotype–phenotype associations is still a challenge for many quantitative traits, mainly because of the large number of genomic loci with weak individual effects on the trait under investigation. Thus, it can be hypothesized that many genomic variants with a small but real effect remain unnoticed in many GWAS approaches. Here, we propose a two-step procedure to address this problem. In the first step, cubic splines are fitted to the test statistic values, and genomic regions with spline peaks that are higher than expected by chance are considered quantitative trait loci (QTL). Then the SNPs in these QTLs are prioritized with respect to the strength of their association with the phenotype using a Random Forests approach. As a case study, we apply our procedure to real data sets and find trustworthy numbers of partially novel genomic variants and genes involved in various egg quality traits.
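
    The first step of this procedure, finding peaks in a smoothed test-statistic curve that exceed what chance would produce, can be sketched as follows. The sketch makes two loudly stated substitutions: a centered moving average stands in for the fitted cubic splines, and "higher than expected by chance" is approximated by a permutation quantile. Neither choice, nor any function name or data value here, comes from the paper, and the Random Forests ranking of the second step is not shown.

```python
import random

def smooth(values, half_window=2):
    """Centered moving average; a simple stand-in for a fitted cubic spline."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def chance_threshold(values, n_perm=200, quantile=0.95, seed=0):
    """Quantile of the maximum smoothed value after shuffling SNP positions."""
    rng = random.Random(seed)
    shuffled = list(values)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(max(smooth(shuffled)))
    maxima.sort()
    return maxima[min(n_perm - 1, int(quantile * n_perm))]

def candidate_peaks(values):
    """Indices of local maxima of the smoothed curve above the chance threshold."""
    curve = smooth(values)
    cutoff = chance_threshold(values)
    return [i for i in range(1, len(curve) - 1)
            if curve[i] > cutoff
            and curve[i] >= curve[i - 1]
            and curve[i] >= curve[i + 1]]

# Hypothetical per-SNP test statistics with one cluster of elevated values.
statistics = [1.0] * 20 + [5.0, 6.0, 7.0, 6.0, 5.0] + [1.0] * 20
peaks = candidate_peaks(statistics)
```

    Because smoothing pools neighbouring SNPs, a run of moderately elevated statistics can stand out even when no single SNP would pass a strict single-marker threshold, which is the motivation the abstract gives for the approach.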

    MIDESP: Mutual Information-Based Detection of Epistatic SNP Pairs for Qualitative and Quantitative Phenotypes

    Interactions between SNPs result in a complex interplay with the phenotype, known as epistasis. Knowledge of epistasis is a crucial part of understanding the genetic causes of complex traits. However, due to the enormous number of SNP pairs and their complex relationship to the phenotype, identifying epistasis remains a challenging problem. Many approaches for the detection of epistasis have been developed using mutual information (MI) as an association measure. However, these methods have mainly been restricted to case–control phenotypes and are therefore of limited applicability to quantitative traits. To overcome this limitation of MI-based methods, we present a novel MI-based algorithm, MIDESP, to detect epistasis between SNPs for qualitative as well as quantitative phenotypes. Moreover, by incorporating a dataset-dependent correction technique, we deal with the effect of background associations in a genotypic dataset to separate true epistatic interaction signals from false positive interactions that result from single SNP×phenotype associations. To demonstrate the effectiveness of MIDESP, we apply it to two real datasets with qualitative and quantitative phenotypes, respectively. Our results suggest that, by eliminating the background associations, MIDESP can identify important genes that play essential roles in bovine tuberculosis or the egg weight of chickens.
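
    A small sketch shows why mutual information can reveal an interaction that single-SNP tests miss. It is not the MIDESP algorithm: it uses a plain plug-in MI estimate for a discrete phenotype only, omits the background-association correction and the quantitative case, and all names and genotype values are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of the mutual information (in bits) of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def pair_information(snp_a, snp_b, phenotype):
    """MI between the joint genotype of a SNP pair and the phenotype."""
    return mutual_information(list(zip(snp_a, snp_b)), phenotype)

# Hypothetical toy data: the phenotype is 1 only when the two genotypes differ.
snp_a = [0, 0, 1, 1]
snp_b = [0, 1, 0, 1]
phenotype = [0, 1, 1, 0]

single_a = mutual_information(snp_a, phenotype)
single_b = mutual_information(snp_b, phenotype)
joint = pair_information(snp_a, snp_b, phenotype)
```

    In this toy case each SNP alone carries no information about the phenotype, while the pair determines it completely, which is the pattern of pure epistasis that pairwise MI is meant to detect.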